Two-Phase Evaluation Architecture #55

Open
oshorefueled wants to merge 16 commits into main from split_suggest

Conversation

@oshorefueled (Contributor) commented Jan 13, 2026

Two-Phase Evaluation Architecture

Summary

Splits LLM evaluation into two distinct phases—detection and suggestion—to improve output quality and reduce hallucinations.

Problem

The previous single-pass approach asked the LLM to simultaneously detect issues AND generate suggestions in one structured call. This led to:

  • Rushed analysis with shallow reasoning
  • Suggestions that didn't align with detected issues
  • Difficulty debugging which phase caused failures

Solution

Phase 1: Detection — Uses an unstructured LLM call with free-form markdown output. The model focuses solely on finding issues with detailed reasoning.

Phase 2: Suggestion — Uses a structured JSON schema call. The model receives the full document context and detected issues, then generates targeted suggestions for each.
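The flow above can be sketched as follows. This is a minimal illustration only: the DetectionPhaseRunner, SuggestionPhaseRunner, and ResultAssembler classes in this PR do the real work, and the function signatures here are assumptions, not the actual API.

```typescript
// Illustrative orchestration of the two-phase flow. The real
// BaseEvaluator in this PR wires the phase runners together.
interface RawIssue {
  quotedText: string;
  line: number;
  criterionName: string;
  analysis: string;
}

interface Suggestion {
  issueIndex: number; // 1-based, matching "Issue N" in the detection output
  suggestion: string;
  explanation: string;
}

async function evaluateTwoPhase(
  content: string,
  criteria: string,
  detect: (content: string, criteria: string) => Promise<RawIssue[]>,
  suggest: (content: string, issues: RawIssue[]) => Promise<Suggestion[]>
): Promise<{ issues: RawIssue[]; suggestions: Suggestion[] }> {
  // Phase 1: unstructured call focused purely on finding issues.
  const issues = await detect(content, criteria);
  if (issues.length === 0) {
    return { issues: [], suggestions: [] }; // "no issues" short-circuit
  }
  // Phase 2: structured call sees the full document plus the detected issues.
  const suggestions = await suggest(content, issues);
  return { issues, suggestions };
}
```

Note the short-circuit: when detection finds nothing, the suggestion phase is skipped entirely, which is also the behavior the PR's tests exercise.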

Changes

  • LLM Providers: Added runPromptUnstructured() to all providers
  • Detection Phase: New DetectionPhaseRunner with markdown parser
  • Suggestion Phase: New SuggestionPhaseRunner with Zod validation
  • Result Assembly: New ResultAssembler merges phases and aggregates tokens
  • Retry Logic: New withRetry() utility for transient failures
  • Base Evaluator: Updated to orchestrate two-phase flow

Stats

27 files changed, 6576 insertions(+), 158 deletions(-)
398 tests passing

Testing

  • Property-based tests for detection parsing accuracy
  • Property-based tests for suggestion-to-issue matching
  • Zod runtime validation on LLM responses
  • Token usage aggregation verification

Breaking Changes

None. Output format remains backward compatible.

Summary by CodeRabbit

Release Notes

  • New Features
    • Implemented two-phase evaluation system that detects issues in content, then generates targeted suggestions for fixes
    • Added automatic retry capability for handling temporary service interruptions
    • Suggestions now generated with full document context for improved coherence and quality


oshorefueled and others added 16 commits January 13, 2026 13:35
Implements withRetry function for handling transient LLM API failures
with configurable retry attempts and detailed logging.

Changes:
- Add src/evaluators/retry.ts with withRetry function
- Accepts operation, maxRetries (default 3), and context string
- Logs each retry attempt with [vectorlint] prefix for debugging
- Returns RetryResult<T> with data and attempt count
- Export from src/evaluators/index.ts for use by phase runners
- Add comprehensive unit tests (9 tests, all passing)
- Include Property 5 test for retry behavior verification

Purpose: Provides foundational retry logic for detection and suggestion
phases in the two-phase evaluation architecture.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
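A minimal sketch of what such a utility can look like. The real retry.ts also exports a RetryOptions type, and a later commit in this PR changed withRetry to return T directly rather than a RetryResult wrapper; the exact signature below is an assumption based on those descriptions, not the actual code.

```typescript
// Sketch of a retry utility for transient failures, assuming the
// final (unwrapped) signature: operation, maxRetries (default 3),
// and a context string used for log messages.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxRetries = 3,
  context = "operation"
): Promise<T> {
  // Guard against maxRetries: 0, which would otherwise throw undefined.
  if (maxRetries < 1) {
    throw new Error(`withRetry(${context}): maxRetries must be >= 1`);
  }
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await operation();
    } catch (e: unknown) {
      lastError = e;
      console.log(
        `[vectorlint] ${context} failed (attempt ${attempt}/${maxRetries})`
      );
    }
  }
  throw lastError;
}
```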
- Add runPromptUnstructured method to LLMProvider interface
- Implement runPromptUnstructured in OpenAI, Anthropic, Azure OpenAI, and Gemini providers
- Returns raw text response instead of structured JSON
- Includes debug logging, error handling, and token usage tracking
- Add unit tests for unstructured provider methods

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
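The shape of the interface extension might look like this. The LLMResult field names (data, inputTokens, outputTokens) are inferred from the token-usage tracking and test mocks described elsewhere in this PR; the real method parameters may differ.

```typescript
// Sketch of the LLMProvider extension. Field and parameter names
// are assumptions inferred from this PR's description, not the real API.
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

interface LLMResult<T> {
  data: T;
  usage: TokenUsage;
}

interface LLMProvider {
  // Existing structured call: response must conform to a JSON schema.
  runPromptStructured<T>(
    content: string,
    prompt: string,
    schema: object
  ): Promise<LLMResult<T>>;
  // New in this PR: raw text response, no schema enforcement.
  runPromptUnstructured(
    content: string,
    prompt: string
  ): Promise<LLMResult<string>>;
}

// A minimal fake provider, mirroring the dependency-injection style
// the PR's tests use (no network calls in unit tests).
class FakeProvider implements LLMProvider {
  async runPromptStructured<T>(
    _content: string,
    _prompt: string,
    _schema: object
  ): Promise<LLMResult<T>> {
    return { data: {} as T, usage: { inputTokens: 0, outputTokens: 0 } };
  }
  async runPromptUnstructured(
    _content: string,
    _prompt: string
  ): Promise<LLMResult<string>> {
    return { data: "No issues found.", usage: { inputTokens: 1, outputTokens: 1 } };
  }
}
```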
Add detection-phase key to src/evaluators/prompts.json with:
- Guided format instructions for output with markdown example
- Issue N output format (Issue 1, Issue 2, etc.) as section headers
- Required fields: quotedText, contextBefore, contextAfter, line, criterionName, analysis
- {criteria} placeholder for dynamic criteria insertion
- Guidelines for issue ordering, context requirements, and "No issues found" handling

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add DetectionPhaseRunner class as the first phase of the two-phase
detection/suggestion architecture. The detection phase identifies
issues in content based on evaluation criteria using an unstructured
LLM prompt.

Changes:
- Create src/evaluators/detection-phase.ts with DetectionPhaseRunner
- Implement run(content, criteria, options): Promise<DetectionResult>
- Build detection prompt with criteria from PromptFile via getPrompt()
- Use runPromptUnstructured for LLM call to get free-form text response
- Integrate retry logic from retry.ts for transient failure handling
- Include basic markdown parser for Issue N sections
- Define types: RawDetectionIssue, DetectionResult, DetectionPhaseOptions
- Export from src/evaluators/index.ts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…tests

- Fix analysis regex bug in DetectionPhaseRunner.parseIssueSection()
- Add comprehensive test suite with 17 tests including Property 2
- Parser correctly extracts all required fields from markdown responses
- Gracefully handles malformed sections by skipping them
- All tests pass

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
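A simplified version of the parsing step might look like this. It is illustrative only: the real parseIssueSection() also extracts contextBefore/contextAfter and handles more formatting variants, and the exact field layout of the LLM's markdown output is taken from this PR's test fixtures.

```typescript
// Illustrative parser for "## Issue N" detection output. Malformed
// sections are skipped rather than failing the whole response.
interface RawDetectionIssue {
  quotedText: string;
  line: number;
  criterionName: string;
  analysis: string;
}

function parseDetectionMarkdown(text: string): RawDetectionIssue[] {
  // The prompt instructs the model to emit exactly this when clean.
  if (/^no issues found\.?$/i.test(text.trim())) return [];

  // Split on "## Issue N" section headers; index 0 is any preamble.
  const sections = text.split(/^## Issue \d+\s*$/m).slice(1);
  const issues: RawDetectionIssue[] = [];
  for (const section of sections) {
    const quoted = /\*\*quotedText:\*\*\s*\n>\s*(.+)/.exec(section);
    const line = /\*\*line:\*\*\s*(\d+)/.exec(section);
    const criterion = /\*\*criterionName:\*\*\s*(.+)/.exec(section);
    const analysis = /\*\*analysis:\*\*\s*\n([\s\S]+?)(?=\n\*\*|$)/.exec(section);
    if (!quoted || !line || !criterion || !analysis) continue; // skip malformed
    issues.push({
      quotedText: quoted[1].trim(),
      line: Number(line[1]),
      criterionName: criterion[1].trim(),
      analysis: analysis[1].trim(),
    });
  }
  return issues;
}
```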
Add suggestion-phase prompt template to src/evaluators/prompts.json:
- Universal template for all rule types (check, judge, style guide)
- Placeholders for {content}, {issues}, and {criteria}
- Output format with ## Issue N sections matching detection phase
- Each suggestion includes actionable fix and explanation
- Instructs exactly one suggestion per issue
- Guidelines for preserving voice/tone and avoiding over-editing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add buildSuggestionLLMSchema() to src/prompts/schema.ts
- Schema defines array of {issueIndex, suggestion, explanation}
- Strict mode enabled for structured output validation
- Add SuggestionLLMResult TypeScript type
- Schema name: vectorlint_suggestion_result

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Create SuggestionPhaseRunner class in src/evaluators/suggestion-phase.ts
- Implement run(content, issues, criteria) method using runPromptStructured
- Add buildPrompt method to format content, issues, and criteria
- Add formatIssues method to convert RawDetectionIssue to markdown
- Integrate withRetry logic for transient failure handling
- Export SuggestionPhaseRunner, Suggestion, SuggestionResult, SuggestionPhaseOptions types
- Add comprehensive test suite with 12 tests in tests/suggestion-phase.test.ts
- Property 4 tests verify suggestion-to-issue matching by index
- Update PRD to mark feature as complete
- Update progress.txt with implementation details

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implemented ResultAssembler class that combines detection and suggestion
phase results into final CheckResult/JudgeResult formats.

- assembleCheckResult: Merges detection issues with suggestions for
  check-style evaluation results
- assembleJudgeResult: Groups violations by criterion for judge-style
  evaluation results
- aggregateTokenUsage: Correctly combines token usage from both phases

Property 6: Result schema conformance - all tests verify output matches
CheckResult and JudgeResult schema requirements

Property 7: Token usage aggregation - tests verify correct summation
of input/output tokens from both phases

All 22 tests pass.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
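The token aggregation step can be sketched as follows. The inputTokens/outputTokens field names follow the usage shape visible in this PR's test mocks; the real aggregateTokenUsage signature is not shown in this page and may differ.

```typescript
// Sketch of token-usage aggregation across detection and suggestion
// phases; undefined entries (e.g. a skipped suggestion phase) count as zero.
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

function aggregateTokenUsage(
  ...usages: Array<TokenUsage | undefined>
): TokenUsage {
  return usages.reduce<TokenUsage>(
    (total, u) => ({
      inputTokens: total.inputTokens + (u?.inputTokens ?? 0),
      outputTokens: total.outputTokens + (u?.outputTokens ?? 0),
    }),
    { inputTokens: 0, outputTokens: 0 }
  );
}
```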
- Add DetectionPhaseRunner, SuggestionPhaseRunner, and ResultAssembler to BaseEvaluator
- Rewrite runCheckEvaluation to use detection→suggestion→assembly flow
- Rewrite runJudgeEvaluation to use detection→suggestion→assembly flow
- Pass full document to suggestion phase even when chunking (Property 3)
- Add buildCriteriaString helper for prompt criteria formatting
- Update ResultAssembler to handle string strictness and PromptCriterionSpec
- Add Property 1 and Property 3 tests in base-evaluator-two-phase.test.ts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix azure-openai-provider.test.ts mock syntax and error expectations
- Fix anthropic-provider.test.ts mock variable scope and instanceof checks
- Fix openai-provider.test.ts mock variable scope
- Update scoring-types.test.ts for two-phase detection/suggestion flow:
  - Mock both runPromptUnstructured (detection) and runPromptStructured (suggestion)
  - Fix detection response format to match parser expectations
  - Update score expectations for new density-based scoring formula
  - Add beforeEach hook to clear mocks between tests

All 394 tests pass.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Added "Two-Phase Detection/Suggestion Architecture" section explaining all three phases
- Documented key components table with file locations and purposes
- Listed all 7 property tests that validate the architecture
- Updated LLM Provider Methods subsection with both structured and unstructured calls
- Updated project structure to include new evaluator files
- Marked final PRD task as complete

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove unused imports from test files (DetectionResult, ResultAssemblerOptions, RetryResult, DefaultRequestBuilder, JudgeLLMResult, CheckLLMResult)
- Replace unsafe 'any' type assertions with proper type assertions using 'unknown' intermediate type
- Fix variable naming conventions (camelCase to UPPER_CASE for constants)
- Fix async arrow function warnings by removing async keyword where await is not used
- Add eslint-disable comments for unbound-method warnings on vi.mocked() calls
- Add eslint-disable comments for unsafe type assignments that are necessary for test mocking
- Remove unused imports from src/evaluators/base-evaluator.ts (unused schema and scoring functions)
- Fix variable naming in src/evaluators/result-assembler.ts (final_score to finalScore)

All 394 tests still pass. Lint is now clean.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add runtime validation using Zod to ensure LLM responses in the suggestion
phase conform to the expected schema, providing an additional layer of safety
beyond the structured output schema.

Changes:
- Add SUGGESTION_LLM_RESULT_SCHEMA Zod schema in src/prompts/schema.ts
  - Validates suggestions array with issueIndex (positive int), suggestion
    (non-empty string), and explanation (non-empty string)
- Update src/evaluators/suggestion-phase.ts to use Zod validation
  - Import SUGGESTION_LLM_RESULT_SCHEMA
  - Parse llmResult.data before using it
- Add 4 test cases in tests/suggestion-phase.test.ts for validation:
  - Missing required field
  - Wrong type for issueIndex
  - Invalid value (zero issueIndex)
  - Empty string for suggestion

All 398 tests pass (increased from 394).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
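The constraints can be expressed without the Zod dependency as a plain type guard. This is a sketch mirroring the rules the commit describes for SUGGESTION_LLM_RESULT_SCHEMA; the actual implementation uses Zod (z.number().int().positive(), z.string().min(1)).

```typescript
// Plain-TypeScript equivalent of the per-item constraints this PR
// enforces with Zod on suggestion-phase LLM responses.
interface SuggestionItem {
  issueIndex: number;
  suggestion: string;
  explanation: string;
}

function isValidSuggestion(v: unknown): v is SuggestionItem {
  if (typeof v !== "object" || v === null) return false;
  const o = v as Record<string, unknown>;
  return (
    typeof o.issueIndex === "number" &&
    Number.isInteger(o.issueIndex) &&
    o.issueIndex >= 1 && // 1-based, matching "Issue N"
    typeof o.suggestion === "string" &&
    o.suggestion.length > 0 && // non-empty
    typeof o.explanation === "string" &&
    o.explanation.length > 0 // non-empty
  );
}
```

This is exactly the validation gap the review below flags: a structured-output JSON schema that allows floats, zero, and empty strings would accept payloads that these runtime checks reject.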
Update catch block from 'catch {' to 'catch (_e: unknown)' with
explanatory comment as per PRD task requirements.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ng RetryResult wrapper

- Inline buildCheckMessage(), buildCriterionSummary(), buildCriterionReasoning(),
  normalizeStrictness(), and calculateCriterionScore() in result-assembler.ts
- Remove RetryResult<T> wrapper from retry.ts; withRetry() now returns T directly
- Update detection-phase.ts and suggestion-phase.ts to use new withRetry signature
- Remove @example JSDoc blocks from exported classes (ResultAssembler,
  DetectionPhaseRunner, SuggestionPhaseRunner)
- Update tests/retry.test.ts to use direct return value instead of RetryResult

Line count reduced from ~955 to 822 lines (-133 lines, ~14% reduction)

All 398 tests pass, build compiles, lint clean.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@coderabbitai bot commented Jan 13, 2026

📝 Walkthrough

Walkthrough

This PR implements a two-phase evaluation architecture that replaces single-pass evaluation. It introduces detection (unstructured LLM calls to identify issues), suggestion (structured LLM calls to propose fixes), result assembly, a retry utility with logging, and extends LLM providers with unstructured prompt support. Comprehensive tests validate the new workflow.

Changes

  • LLM Provider Extension (src/providers/llm-provider.ts, src/providers/openai-provider.ts, src/providers/anthropic-provider.ts, src/providers/azure-openai-provider.ts, src/providers/gemini-provider.ts): Added a runPromptUnstructured method to the LLMProvider interface and implemented it across all four providers; returns raw text with logging, error handling, and token tracking.
  • Detection Phase (src/evaluators/detection-phase.ts): New DetectionPhaseRunner class that uses unstructured LLM calls to identify issues; includes a markdown parser for structured issue extraction (quotedText, contextBefore/After, line, criterionName, analysis).
  • Suggestion Phase (src/evaluators/suggestion-phase.ts): New SuggestionPhaseRunner class that uses structured LLM calls with a JSON schema to generate per-issue suggestions based on detection results and full document context.
  • Result Assembly (src/evaluators/result-assembler.ts): New ResultAssembler class that merges detection and suggestion results into CheckResult/JudgeResult formats; includes token usage aggregation across both phases.
  • Supporting Infrastructure (src/evaluators/retry.ts, src/prompts/schema.ts, src/evaluators/prompts.json, src/evaluators/index.ts): Added withRetry utility for transient LLM failures; new SUGGESTION_LLM_RESULT_SCHEMA for Zod validation; new detection-phase and suggestion-phase prompt templates; updated evaluators index exports.
  • Core Integration (src/evaluators/base-evaluator.ts): Refactored to orchestrate the two-phase flow: detection on chunks, suggestion on the full document, assembly of final results; replaced per-chunk scoring with a unified pipeline.
  • Documentation & Config (AGENTS.md, .gitignore): Documented the three-phase evaluation architecture and component roles; added the .kiro/ directory to .gitignore.
  • Test Coverage (tests/detection-phase.test.ts, tests/suggestion-phase.test.ts, tests/result-assembler.test.ts, tests/retry.test.ts, tests/base-evaluator-two-phase.test.ts, tests/*-provider.test.ts, tests/scoring-types.test.ts): Comprehensive test suites added/updated for all new components; provider tests extended for unstructured flow validation.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Evaluator (BaseEvaluator)
    participant DetectionRunner as DetectionPhaseRunner
    participant LLM as LLM Provider
    participant SuggestionRunner as SuggestionPhaseRunner
    participant Assembler as ResultAssembler

    Client->>DetectionRunner: run(content chunk, criteria)
    loop For each content chunk
        DetectionRunner->>LLM: runPromptUnstructured(detection prompt)
        LLM-->>DetectionRunner: raw text response + usage
    end
    DetectionRunner-->>Client: RawDetectionIssue[], TokenUsage

    alt Issues detected
        Client->>SuggestionRunner: run(full document, issues, criteria)
        SuggestionRunner->>LLM: runPromptStructured(suggestion prompt)
        LLM-->>SuggestionRunner: JSON suggestions + usage
        SuggestionRunner-->>Client: Suggestion[], TokenUsage
        
        Client->>Assembler: assembleCheckResult/assembleJudgeResult(issues, suggestions)
        Assembler->>Assembler: mergeIssuesWithSuggestions()
        Assembler->>Assembler: aggregateTokenUsage()
        Assembler-->>Client: CheckResult/JudgeResult with final_score
    else No issues
        Client->>Assembler: assembleCheckResult/assembleJudgeResult(no issues, [])
        Assembler-->>Client: CheckResult/JudgeResult (perfect score)
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested reviewers

  • ayo6706

Poem

🐰 Hop, hop, detection found the flaws so clear,
Then suggestions whisper fixes to your ear,
Two phases dance—unstructured, then structured tight,
Result assembled perfect, shining bright!

🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 warning

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 25.00%, below the required threshold of 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)
  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title 'Two-Phase Evaluation Architecture' accurately and concisely summarizes the main architectural change: splitting LLM evaluation into detection and suggestion phases.


coderabbitai bot left a comment
Actionable comments posted: 6

🤖 Fix all issues with AI agents
In @src/evaluators/result-assembler.ts:
- Around line 206-239: The comment claiming normalized_score maps 1–4 to 1–10 is
incorrect: normalized_score = score * 2.5 yields 2.5–10; update the code in the
result assembly to either (A) correct the comment to state the actual 2.5–10
range for normalized_score, or (B) change the normalization formula to map 1→1
and 4→10 by replacing score * 2.5 with the linear mapping (score - 1) * 3 + 1
(apply this change where normalized_score is computed and keep the comment
consistent).
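The two formulas named in this comment can be compared directly:

```typescript
// Mapping a 1-4 criterion score onto a 1-10 scale.
const current = (score: number): number => score * 2.5; // 1 -> 2.5, 4 -> 10
const proposed = (score: number): number => (score - 1) * 3 + 1; // 1 -> 1, 4 -> 10
```

The current formula compresses the range to 2.5-10, so either the comment should state that range or the linear mapping should be adopted to hit the documented 1-10 bounds.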

In @src/evaluators/retry.ts:
- Around line 30-31: Initialize and/or properly handle the case when maxRetries
can be 0 so lastError is never left undefined; either validate maxRetries to be
>= 1 up front or set lastError to a meaningful default Error before the retry
loop (or after the loop check) and then throw that descriptive error instead of
undefined. Update the logic around maxRetries, lastError, and the retry loop in
retry.ts (symbols: maxRetries, lastError, the retry loop) so callers passing
maxRetries: 0 get a clear error message rather than an undefined throw.

In @src/providers/gemini-provider.ts:
- Around line 96-138: The model instance created in the constructor was set with
responseMimeType: "application/json" causing runPromptUnstructured to still
request JSON; update runPromptUnstructured to create or obtain a temporary
unstructured model (e.g., clone or instantiate a new Model/Client without
responseMimeType or with responseMimeType: "text/markdown" or "text/plain") and
call generateContent on that unstructured model instead of this.model; ensure
you reuse the same client/config values (model name, temperature) and clean up
or reuse the temporary model reference as appropriate to avoid changing the
constructor-initialized JSON model used elsewhere.

In @tests/azure-openai-provider.test.ts:
- Around line 362-419: The Gemini provider is logging debug output with
console.error while other providers use console.log; locate the Gemini provider
class (e.g., GeminiProvider) and replace any debug console.error calls in its
debug/logging code paths (methods like runPromptUnstructured /
runPromptStructured or any place using console.error for "[vectorlint]" debug
messages) with console.log, preserving the exact log message and payload shape
so tests and consumers remain consistent.
🧹 Nitpick comments (9)
src/providers/openai-provider.ts (1)

181-182: Consider extracting shared logic to reduce duplication.

The runPromptUnstructured method shares ~80% of its code with runPromptStructured (params building, debug logging, error handling, response validation). Additionally, buildPromptBodyForStructured is used in an unstructured context, which is slightly misleading.

Consider extracting common logic into private helper methods:

private buildBaseParams(content: string, promptText: string): OpenAI.Chat.Completions.ChatCompletionCreateParams { ... }
private handleApiError(e: unknown): never { ... }
private logDebugInfo(params: ..., response?: ...): void { ... }

This would keep the provider thin per coding guidelines while reducing maintenance burden.

src/prompts/schema.ts (1)

97-131: JSON schema and Zod schema have validation inconsistencies.

The JSON schema allows:

  • issueIndex as any number (including floats and zero)
  • Empty strings for suggestion and explanation

But the Zod schema requires:

  • issueIndex to be a positive integer
  • Non-empty strings (.min(1))

If the LLM returns {"issueIndex": 0, "suggestion": ""}, it passes the JSON schema but fails Zod validation at runtime.

♻️ Align JSON schema with Zod constraints
 properties: {
   issueIndex: {
-    type: "number",
+    type: "integer",
+    minimum: 1,
     description: "The index of the issue this suggestion addresses (1-based, matching Issue 1, Issue 2, etc.)",
   },
   suggestion: {
     type: "string",
+    minLength: 1,
     description: "Specific, actionable text to replace the problematic content",
   },
   explanation: {
     type: "string",
+    minLength: 1,
     description: "Brief explanation of how this suggestion addresses the issue",
   },
 },

Also applies to: 175-183

AGENTS.md (1)

165-165: Capitalize "Markdown" as a proper noun.

-- LLM returns free-form markdown text with `## Issue N` sections
+- LLM returns free-form Markdown text with `## Issue N` sections
tests/suggestion-phase.test.ts (1)

130-154: Consider testing formatIssues through the public interface instead of type assertion.

While testing private methods can be useful, the type assertion pattern (runner as unknown as { formatIssues: ... }) is fragile. If the method name or signature changes, these tests will fail at runtime rather than compile time. Consider either:

  1. Testing this behavior through run() by verifying the prompt content
  2. Making formatIssues protected if it's part of the class's testable contract
src/evaluators/suggestion-phase.ts (1)

149-152: Potential issue: Template replacement may be fragile if content contains placeholder strings.

The simple replace() calls could have unintended effects if the content, issues text, or criteria string contains literal {content}, {issues}, or {criteria} substrings. While unlikely in practice, this could cause partial replacements or double-replacements.

Consider using a more robust templating approach or replacing placeholders sequentially in a way that prevents re-matching.

♻️ Optional: Use more unique placeholders or sequential replacement
   private buildPrompt(
     content: string,
     issues: RawDetectionIssue[],
     criteria: string
   ): string {
     const template = getPrompt("suggestion-phase");
 
     // Format issues for inclusion in the prompt
     const issuesText = this.formatIssues(issues);
 
-    return template
-      .replace("{content}", content)
-      .replace("{issues}", issuesText)
-      .replace("{criteria}", criteria);
+    // Replace placeholders in sequence to avoid re-matching
+    let result = template.replace("{criteria}", criteria);
+    result = result.replace("{issues}", issuesText);
+    result = result.replace("{content}", content);
+    return result;
   }
tests/base-evaluator-two-phase.test.ts (2)

92-95: Weak assertion: toBeGreaterThanOrEqual(0) is always true for array length.

The assertion on line 94 doesn't verify meaningful behavior—structuredCalls.length will always be >= 0. Consider removing this assertion or replacing it with a more specific expectation based on whether issues were detected.

♻️ Consider removing or strengthening the assertion
       // The mock LLM's runPromptUnstructured should have been called for detection
       const unstructuredCalls = (mockLLM.runPromptUnstructured as ReturnType<typeof vi.fn>).mock.calls;
       expect(unstructuredCalls.length).toBeGreaterThan(0);
 
-      // Verify structured call for suggestion was also made if issues were found
-      const structuredCalls = (mockLLM.runPromptStructured as ReturnType<typeof vi.fn>).mock.calls;
-      expect(structuredCalls.length).toBeGreaterThanOrEqual(0);
+      // When no issues are detected (default mock returns "No issues found."),
+      // suggestion phase should not be called
+      const structuredCalls = (mockLLM.runPromptStructured as ReturnType<typeof vi.fn>).mock.calls;
+      expect(structuredCalls.length).toBe(0);

460-468: Incomplete mock detection response may not parse correctly.

The detection response on line 464 is missing contextBefore and contextAfter fields that other tests include. If the detection parser requires these fields, this test may not accurately simulate real behavior.

♻️ Consider using full detection response format for consistency
       (mockLLM.runPromptUnstructured as ReturnType<typeof vi.fn>).mockImplementation(
         () => {
           detectionCallCount++;
           return {
-            data: "## Issue 1\n\n**quotedText:**\n> problem\n\n**line:** 42\n\n**criterionName:** Criterion 1\n\n**analysis:**\nIssue",
+            data: `## Issue 1
+
+**quotedText:**
+> problem
+
+**contextBefore:**
+before
+
+**contextAfter:**
+after
+
+**line:** 42
+
+**criterionName:** Criterion 1
+
+**analysis:**
+This is an issue`,
             usage: { inputTokens: 100, outputTokens: 50 },
           };
         }
       );
src/evaluators/result-assembler.ts (2)

57-61: Consider the default value for totalWordCount.

Defaulting totalWordCount to 1 avoids division by zero but will produce a heavily penalized score if the caller forgets to pass the actual word count. A single violation would result in a score calculation of 10 - ((1/1)*100*strictness)*2, which is clamped to 1.

Consider logging a warning or using a more representative default (e.g., 100) to make misconfiguration more obvious during testing.


305-311: Clarify the 1-based indexing convention.

The mapping uses index + 1 to look up suggestions, implying Suggestion.issueIndex is 1-based while the issues array is 0-based. Consider adding a brief comment to make this convention explicit, as it's a key coupling between detection and suggestion phases.

     return issues.map((issue, index) => {
+      // Suggestions use 1-based issueIndex; issues array is 0-based
       const matchingSuggestion = suggestionMap.get(index + 1);
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 03ed298 and fd3fe1d.

📒 Files selected for processing (27)
  • .gitignore
  • AGENTS.md
  • ralph/prd.json
  • ralph/progress.txt
  • src/evaluators/base-evaluator.ts
  • src/evaluators/detection-phase.ts
  • src/evaluators/index.ts
  • src/evaluators/prompts.json
  • src/evaluators/result-assembler.ts
  • src/evaluators/retry.ts
  • src/evaluators/suggestion-phase.ts
  • src/prompts/schema.ts
  • src/providers/anthropic-provider.ts
  • src/providers/azure-openai-provider.ts
  • src/providers/gemini-provider.ts
  • src/providers/llm-provider.ts
  • src/providers/openai-provider.ts
  • tests/anthropic-provider.test.ts
  • tests/azure-openai-provider.test.ts
  • tests/base-evaluator-two-phase.test.ts
  • tests/detection-phase.test.ts
  • tests/gemini-provider.test.ts
  • tests/openai-provider.test.ts
  • tests/result-assembler.test.ts
  • tests/retry.test.ts
  • tests/scoring-types.test.ts
  • tests/suggestion-phase.test.ts
🧰 Additional context used
📓 Path-based instructions (3)
src/**/*.ts

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.ts: Use TypeScript ESM with explicit imports and narrow types
Use 2-space indentation; avoid trailing whitespace
Maintain strict TypeScript with no any; use unknown + schema validation for external data
Use custom error types with proper inheritance; catch blocks use unknown type

Files:

  • src/evaluators/retry.ts
  • src/evaluators/index.ts
  • src/providers/gemini-provider.ts
  • src/evaluators/suggestion-phase.ts
  • src/providers/azure-openai-provider.ts
  • src/providers/llm-provider.ts
  • src/prompts/schema.ts
  • src/evaluators/base-evaluator.ts
  • src/evaluators/result-assembler.ts
  • src/providers/openai-provider.ts
  • src/evaluators/detection-phase.ts
  • src/providers/anthropic-provider.ts
src/providers/**/*.ts

📄 CodeRabbit inference engine (AGENTS.md)

src/providers/**/*.ts: Depend on LLMProvider and SearchProvider interfaces; keep providers thin (transport only)
Inject RequestBuilder via provider constructor to avoid coupling

Files:

  • src/providers/gemini-provider.ts
  • src/providers/azure-openai-provider.ts
  • src/providers/llm-provider.ts
  • src/providers/openai-provider.ts
  • src/providers/anthropic-provider.ts
tests/**/*.test.ts

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.test.ts: Write tests using Vitest framework with focus on config parsing, file discovery, schema/structured output, and locator
Use dependency injection in tests: mock providers; do not hit network in unit tests

Files:

  • tests/result-assembler.test.ts
  • tests/suggestion-phase.test.ts
  • tests/detection-phase.test.ts
  • tests/scoring-types.test.ts
  • tests/retry.test.ts
  • tests/base-evaluator-two-phase.test.ts
  • tests/openai-provider.test.ts
  • tests/azure-openai-provider.test.ts
  • tests/gemini-provider.test.ts
  • tests/anthropic-provider.test.ts
🧠 Learnings (5)
📚 Learning: 2025-12-28T19:43:51.189Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.189Z
Learning: Applies to tests/**/*.test.ts : Write tests using Vitest framework with focus on config parsing, file discovery, schema/structured output, and locator

Applied to files:

  • tests/result-assembler.test.ts
  • tests/suggestion-phase.test.ts
  • tests/detection-phase.test.ts
  • tests/scoring-types.test.ts
  • tests/retry.test.ts
  • tests/base-evaluator-two-phase.test.ts
  • tests/azure-openai-provider.test.ts
  • tests/gemini-provider.test.ts
📚 Learning: 2025-12-28T19:43:51.189Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.189Z
Learning: Applies to tests/**/*.test.ts : Use dependency injection in tests: mock providers; do not hit network in unit tests

Applied to files:

  • tests/detection-phase.test.ts
  • tests/azure-openai-provider.test.ts
  • tests/gemini-provider.test.ts
  • tests/anthropic-provider.test.ts
📚 Learning: 2025-12-28T19:43:51.189Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.189Z
Learning: Applies to src/providers/**/*.ts : Depend on `LLMProvider` and `SearchProvider` interfaces; keep providers thin (transport only)

Applied to files:

  • tests/scoring-types.test.ts
  • src/providers/llm-provider.ts
  • src/evaluators/base-evaluator.ts
  • AGENTS.md
📚 Learning: 2025-12-28T19:43:51.189Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.189Z
Learning: Applies to src/boundaries/**/*.ts : Use Zod schemas for boundary validation of all external data (files, CLI, env, APIs) at system boundaries

Applied to files:

  • src/prompts/schema.ts
📚 Learning: 2025-12-28T19:43:51.189Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.189Z
Learning: Applies to evals/**/*.md : Evals must include YAML frontmatter; the tool appends evidence instructions automatically

Applied to files:

  • AGENTS.md
🧬 Code graph analysis (20)
src/evaluators/retry.ts (1)
src/evaluators/index.ts (2)
  • RetryOptions (30-30)
  • withRetry (30-30)
src/evaluators/index.ts (4)
src/evaluators/evaluator-registry.ts (1)
  • EvaluatorRegistry (27-59)
src/output/reporter.ts (2)
  • printEvaluationSummaries (138-175)
  • EvaluationSummary (7-11)
src/cli/orchestrator.ts (1)
  • extractAndReportCriterion (257-463)
src/evaluators/accuracy-evaluator.ts (1)
  • TechnicalAccuracyEvaluator (48-239)
src/providers/gemini-provider.ts (3)
src/providers/llm-provider.ts (1)
  • LLMResult (3-6)
src/errors/index.ts (1)
  • handleUnknownError (46-51)
tests/evaluator.test.ts (1)
  • runPromptStructured (10-14)
src/evaluators/suggestion-phase.ts (4)
src/providers/token-usage.ts (1)
  • TokenUsage (1-4)
src/prompts/schema.ts (2)
  • buildSuggestionLLMSchema (97-131)
  • SUGGESTION_LLM_RESULT_SCHEMA (175-183)
src/evaluators/detection-phase.ts (1)
  • RawDetectionIssue (21-34)
src/evaluators/retry.ts (1)
  • withRetry (26-57)
src/providers/azure-openai-provider.ts (3)
src/providers/llm-provider.ts (1)
  • LLMResult (3-6)
src/errors/index.ts (1)
  • handleUnknownError (46-51)
src/boundaries/api-client.ts (1)
  • validateApiResponse (9-20)
tests/detection-phase.test.ts (1)
src/evaluators/detection-phase.ts (2)
  • DetectionPhaseRunner (69-227)
  • RawDetectionIssue (21-34)
tests/scoring-types.test.ts (4)
src/providers/llm-provider.ts (1)
  • LLMProvider (8-11)
src/evaluators/base-evaluator.ts (1)
  • BaseEvaluator (42-297)
tests/evaluator.test.ts (2)
  • runPromptStructured (10-14)
  • FakeProvider (8-15)
tests/provider-factory.test.ts (1)
  • it (227-292)
src/providers/llm-provider.ts (1)
tests/evaluator.test.ts (1)
  • runPromptStructured (10-14)
tests/retry.test.ts (1)
src/evaluators/retry.ts (1)
  • withRetry (26-57)
src/prompts/schema.ts (1)
src/output/rdjson-formatter.ts (1)
  • RdJsonSuggestion (35-47)
src/evaluators/base-evaluator.ts (5)
src/evaluators/evaluator.ts (1)
  • Evaluator (7-9)
src/evaluators/result-assembler.ts (1)
  • ResultAssembler (43-347)
src/providers/token-usage.ts (1)
  • TokenUsage (1-4)
src/chunking/index.ts (1)
  • countWords (4-4)
src/cli/orchestrator.ts (3)
  • evaluateFile (757-907)
  • extractAndReportCriterion (257-463)
  • routePromptResult (544-705)
src/providers/openai-provider.ts (2)
src/providers/llm-provider.ts (1)
  • LLMResult (3-6)
src/errors/index.ts (1)
  • handleUnknownError (46-51)
tests/openai-provider.test.ts (2)
src/schemas/openai-responses.ts (1)
  • OpenAIResponse (43-43)
src/providers/openai-provider.ts (1)
  • OpenAIProvider (23-270)
tests/azure-openai-provider.test.ts (1)
src/providers/azure-openai-provider.ts (1)
  • AzureOpenAIProvider (23-223)
AGENTS.md (4)
src/evaluators/accuracy-evaluator.ts (1)
  • TechnicalAccuracyEvaluator (48-239)
tests/evaluator.test.ts (3)
  • FakeProvider (8-15)
  • runPromptStructured (10-14)
  • it (39-53)
src/cli/orchestrator.ts (2)
  • evaluateFile (757-907)
  • runPromptEvaluation (713-752)
src/evaluators/evaluator.ts (1)
  • Evaluator (7-9)
tests/gemini-provider.test.ts (1)
src/providers/gemini-provider.ts (1)
  • GeminiProvider (20-139)
src/evaluators/detection-phase.ts (4)
src/providers/token-usage.ts (1)
  • TokenUsage (1-4)
src/evaluators/retry.ts (1)
  • withRetry (26-57)
tests/evaluator.test.ts (2)
  • runPromptStructured (10-14)
  • FakeProvider (8-15)
src/cli/types.ts (1)
  • RunPromptEvaluationParams (113-119)
tests/anthropic-provider.test.ts (3)
tests/schemas/mock-schemas.ts (1)
  • MockAnthropicClient (66-70)
src/providers/anthropic-provider.ts (1)
  • AnthropicProvider (31-342)
tests/anthropic-e2e.test.ts (3)
  • vi (62-71)
  • default (18-52)
  • MockAnthropicClient (20-24)
ralph/prd.json (1)
tests/evaluator.test.ts (2)
  • runPromptStructured (10-14)
  • FakeProvider (8-15)
src/providers/anthropic-provider.ts (3)
src/providers/llm-provider.ts (1)
  • LLMResult (3-6)
src/errors/index.ts (1)
  • handleUnknownError (46-51)
src/schemas/anthropic-responses.ts (1)
  • isTextBlock (44-46)
🪛 LanguageTool
AGENTS.md

[uncategorized] ~165-~165: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...tUnstructured) - LLM returns free-form markdown text with `## Issue N` sections - Conte...

(MARKDOWN_NNP)

ralph/progress.txt

[uncategorized] ~54-~54: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...ded format instructions for output with markdown code block example - Output template sp...

(MARKDOWN_NNP)


[uncategorized] ~61-~61: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...ontent. The unstructured output format (markdown with Issue N sections) is designed to b...

(MARKDOWN_NNP)


[uncategorized] ~75-~75: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...sient failure handling - Includes basic markdown parser to extract issues from LLM respo...

(MARKDOWN_NNP)


[uncategorized] ~168-~168: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...hod converts RawDetectionIssue array to markdown format for prompt inclusion - Uses `run...

(MARKDOWN_NNP)


[style] ~234-~234: This adverb was used twice in the sentence. Consider removing one of them or replacing them with a synonym.
Context: ...tly typed - Optional fields handled correctly - CheckResult and JudgeResult outpu...

(ADVERB_REPETITION_PREMIUM)


[style] ~240-~240: This adverb was used twice in the sentence. Consider removing one of them or replacing them with a synonym.
Context: ...n suggestion is undefined - Returns only suggestion usage when detection is unde...

(ADVERB_REPETITION_PREMIUM)


[style] ~347-~347: The words ‘explanations’ and ‘explaining’ are quite similar. Consider replacing ‘explaining’ with a different word.
Context: ...haseRunner` has detailed JSDoc comments explaining purpose - `SuggestionPhaseRunner` has...

(VERB_NOUN_SENT_LEVEL_REP)


[style] ~430-~430: This phrase might be redundant. Consider either removing or replacing the adjective ‘additional’.
Context: ...rs Purpose: Zod runtime validation adds an additional layer of safety for the suggestion phase LLM ...

(ADD_AN_ADDITIONAL)

🔇 Additional comments (73)
.gitignore (1)

11-11: Verify intent: .kiro/ directory is no longer ignored.

Commenting out this entry means the .kiro/ directory could now be tracked by Git if present. This change appears unrelated to the two-phase evaluation architecture. Was this intentional, or should this remain ignored to avoid committing tool-specific files?

src/providers/llm-provider.ts (1)

8-11: LGTM!

Clean interface extension adding unstructured prompt support. The signature mirrors runPromptStructured appropriately (minus the schema parameter), and LLMResult<string> is the correct return type for raw text responses. This keeps providers thin as transport-only components per coding guidelines.
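To make the contract concrete, here is a minimal sketch of the extended interface as described in this review. The `TokenUsage` shape is inferred from the usage mapping elsewhere in the diff, and the structured method's schema parameter is typed `unknown` as a placeholder for the real schema type:

```typescript
// Sketch only: field names mirror the PR's usage mapping (inputTokens/outputTokens);
// the schema parameter type is a stand-in, not the repo's actual type.
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

interface LLMResult<T> {
  data: T;
  usage?: TokenUsage;
}

interface LLMProvider {
  runPromptStructured<T>(content: string, promptText: string, schema: unknown): Promise<LLMResult<T>>;
  runPromptUnstructured(content: string, promptText: string): Promise<LLMResult<string>>;
}

// A fake provider in the spirit of the repo's FakeProvider test helper:
// transport-only, no network access.
const fakeProvider: LLMProvider = {
  async runPromptStructured<T>(): Promise<LLMResult<T>> {
    return { data: {} as T };
  },
  async runPromptUnstructured(content: string): Promise<LLMResult<string>> {
    return { data: `echo:${content}`, usage: { inputTokens: 1, outputTokens: 2 } };
  },
};
```

Keeping both methods on one interface lets evaluators take a single injected provider for both phases.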

src/evaluators/prompts.json (1)

3-4: LGTM!

The new detection-phase and suggestion-phase prompts are well-structured and align with the two-phase evaluation architecture. The detection prompt specifies a clear markdown output format for parsing, while the suggestion prompt includes appropriate template placeholders ({content}, {issues}, {criteria}) for runtime substitution.

src/providers/openai-provider.ts (1)

218-268: LGTM on error handling and token usage tracking.

The error handling properly distinguishes between rate limits, authentication, and general API errors. Token usage mapping correctly extracts `prompt_tokens` → `inputTokens` and `completion_tokens` → `outputTokens`, maintaining consistency with the structured path.

src/providers/anthropic-provider.ts (1)

242-341: Implementation is correct and consistent.

The unstructured method properly:

  • Omits tool schema and tool_choice from the request
  • Reuses the validated response schema which handles both structured and unstructured content blocks
  • Correctly filters for text blocks and concatenates them
  • Handles all Anthropic-specific error types consistently with the structured method
src/providers/azure-openai-provider.ts (1)

142-222: LGTM - Implementation correctly omits JSON schema for unstructured responses.

The key difference from runPromptStructured (no response_format parameter) is correct for unstructured text output. Response validation, error handling, and usage tracking are consistent with the structured method.

ralph/prd.json (1)

1-207: PRD tracking file documents implementation tasks correctly.

This is project management metadata that accurately tracks the two-phase architecture implementation. All tasks are marked complete and the JSON structure is valid.

ralph/progress.txt (1)

1-470: Progress log provides comprehensive implementation documentation.

This development log accurately documents the incremental implementation of the two-phase architecture. The static analysis hints about "Markdown" capitalization are minor style nits that don't affect the documentation's utility.

src/prompts/schema.ts (1)

160-183: Well-structured type and runtime validation.

The SuggestionLLMResult type and SUGGESTION_LLM_RESULT_SCHEMA are well-documented and follow the coding guidelines for using Zod schemas for boundary validation of external data.

tests/openai-provider.test.ts (3)

1021-1067: Comprehensive unstructured response handling tests.

The test suite properly validates the new runPromptUnstructured method including raw text extraction and token usage tracking. The tests follow the Vitest framework guidelines and use dependency injection with mocked providers.


1069-1117: Verify intentional behavioral difference for empty/null content.

The unstructured path returns an empty string for empty/null content (lines 1091, 1116), while the structured path throws "Empty response from OpenAI API (no content)." (lines 596-598). This asymmetry may be intentional since unstructured responses don't require content, but worth confirming this design decision.


1336-1359: Good verification that unstructured calls omit response_format.

This test correctly ensures the unstructured path doesn't include response_format, which would force JSON mode. This is essential for the two-phase architecture where detection uses free-form markdown responses.

tests/retry.test.ts (3)

1-11: Solid test foundation for retry utility.

The initial tests correctly validate the happy path (first attempt success) and verify the operation is called exactly once. The mock setup is clean and follows Vitest best practices.


28-49: Good coverage of retry exhaustion and default behavior.

The tests properly validate that:

  • All retries are exhausted before throwing
  • The last error is propagated
  • Default maxRetries of 3 is applied when not specified

This aligns with the implementation in src/evaluators/retry.ts.


112-136: Property test validates retry success within limit.

The test effectively demonstrates that the retry mechanism correctly returns successful results when the operation succeeds before exhausting retries. The manual counter approach clearly shows the attempt progression.

tests/detection-phase.test.ts (4)

7-14: Clean mock provider factory for detection phase tests.

The mock provider correctly implements only runPromptUnstructured since DetectionPhaseRunner exclusively uses unstructured LLM calls. The factory pattern enables easy customization of mock responses per test.


19-52: Thorough single issue parsing test.

The test validates all required fields are correctly extracted from the markdown format. The expected output structure matches the RawDetectionIssue interface from the implementation.


271-334: Good integration tests for run() method.

The tests properly verify:

  • LLM provider receives content and built prompt
  • Detection results include parsed issues, hasIssues flag, raw response, and usage data
  • Retry logic is appropriately deferred to retry.test.ts

This follows the testing guideline of focusing on component integration rather than duplicating unit tests.


430-529: Excellent robustness testing for malformed sections.

The test at lines 430-529 validates that the parser:

  • Skips issues missing required fields (quotedText, line)
  • Preserves valid issues even when surrounded by invalid ones
  • Correctly identifies 3 valid issues out of 5

This ensures the detection phase gracefully handles LLM output variability.

AGENTS.md (3)

14-17: Good documentation of new evaluator components.

The directory structure documentation now accurately reflects the new two-phase architecture files with clear purpose descriptions.


158-198: Excellent two-phase architecture documentation.

The documentation clearly explains:

  • The purpose and flow of each phase
  • Key components with file locations
  • Property tests that validate the architecture

This will help contributors understand the evaluation flow quickly.


209-212: Clear provider method documentation.

The documentation succinctly describes both LLM provider methods and their use cases, helping developers choose the appropriate method for their phase implementation.

tests/gemini-provider.test.ts (5)

1-20: LGTM! Well-structured mock setup.

The SDK mock correctly isolates the provider from network calls while exposing a shared mock function for test control. The hoisted mock pattern with SHARED_GENERATE_CONTENT enables flexible per-test behavior configuration.


22-45: LGTM! Constructor tests cover essential configuration scenarios.

Tests validate both minimal and full configuration options without hitting the network.


47-216: LGTM! Comprehensive unstructured response handling tests.

Excellent coverage of edge cases including:

  • Whitespace trimming behavior
  • Markdown passthrough without modification
  • No JSON parsing for unstructured responses
  • Graceful handling of null/undefined usage metadata
  • Proper error wrapping and propagation

218-258: LGTM! Debug logging tests properly validate conditional behavior.

Good use of console spy with proper cleanup in afterEach. The tests verify that debug information is logged only when enabled and includes expected metadata.


323-416: LGTM! Structured response handling tests provide good coverage.

Tests validate JSON parsing success, parsing failures, and error propagation with appropriate error messages.

src/evaluators/index.ts (1)

29-51: LGTM! Clean barrel file organization for the two-phase architecture.

The exports are well-organized with clear comments explaining each group. Type exports correctly use the type keyword, and the ordering maintains the self-registration import at the end.

tests/suggestion-phase.test.ts (4)

1-47: LGTM! Well-structured test setup with comprehensive sample data.

The mock provider factory and sample issues array provide a solid foundation for testing the suggestion phase. The sample issues cover diverse criterion types (passive voice, vocabulary, wordy phrases).


48-128: LGTM! Run method tests validate core functionality.

Tests properly verify the LLM provider interaction, suggestion mapping, and result structure including the hasSuggestions convenience flag.


156-367: LGTM! Thorough property-based tests for suggestion-to-issue matching.

Excellent coverage of the matching contract including:

  • 1-based indexing semantics
  • Partial suggestion sets (LLM may not suggest for all issues)
  • Order preservation from LLM response
  • Field type validation (positive integer, non-empty strings)

369-465: LGTM! Zod validation tests ensure runtime type safety.

Good coverage of validation edge cases ensuring malformed LLM responses are rejected. This provides a safety net against LLM hallucinations producing invalid data structures.
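As a rough illustration of the contract those tests enforce (field names here are hypothetical; the real Zod schema in src/prompts/schema.ts is the source of truth), a suggestion entry must carry a 1-based positive integer index and non-empty strings:

```typescript
// Hypothetical shape for illustration only; the actual validation is the
// Zod schema in src/prompts/schema.ts.
interface RawSuggestion {
  issueIndex: number;   // 1-based index into the detected issues
  suggestedText: string;
}

function isValidSuggestion(value: unknown): value is RawSuggestion {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.issueIndex === "number" &&
    Number.isInteger(v.issueIndex) &&
    v.issueIndex >= 1 &&
    typeof v.suggestedText === "string" &&
    v.suggestedText.length > 0
  );
}
```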

tests/scoring-types.test.ts (4)

1-21: LGTM! Proper mock setup for two-phase evaluation testing.

The mock provider correctly implements both runPromptStructured and runPromptUnstructured methods. The beforeEach hook ensures clean state between tests by clearing mock call history.


40-141: LGTM! Judge evaluation tests validate two-phase integration.

The test properly mocks the detection phase with markdown-formatted issues and the suggestion phase with structured suggestions. The assertions verify that:

  • Scores are calculated correctly based on violations
  • Both criteria are represented in results
  • The suggestion phase is appropriately skipped when no issues are detected

183-288: LGTM! Check evaluation tests validate scoring mechanics.

The tests properly verify:

  • Score calculation based on violation density (10 - (2 * 2) = 6)
  • Percentage mapping (60%)
  • Perfect score for zero violations
  • Suggestion phase optimization (skipped when no issues)

291-344: LGTM! Technical accuracy evaluator test validates claim extraction flow.

Good use of vi.resetModules() and vi.doMock() to isolate the test from other tests' mock configurations. The test properly validates the perfect score path when no claims are extracted.

tests/azure-openai-provider.test.ts (4)

1-68: LGTM! Well-structured test setup with hoisted error classes.

The vi.hoisted pattern correctly lifts the error class definitions for use in the mock factory, enabling proper instanceof checks in tests. The mock structure matches the openai SDK's expected interface.


69-109: LGTM! Clean test setup with dynamic mock wiring.

The beforeEach hook properly wires the mocked validateApiResponse function, enabling per-test control over validation behavior. Constructor tests cover both minimal and full configuration scenarios.


111-360: LGTM! Comprehensive unstructured response handling tests.

Excellent coverage including:

  • Raw text extraction and trimming
  • Empty and null content error handling
  • Markdown passthrough
  • Temperature configuration propagation
  • Response validation errors
  • Unknown error wrapping

421-478: LGTM! Structured response handling test validates JSON parsing and usage mapping.

The test properly verifies that structured JSON responses are parsed correctly and usage metadata is mapped to the expected format (inputTokens, outputTokens).

tests/result-assembler.test.ts (5)

1-45: LGTM! Well-structured test helpers and clear test documentation.

The test file header clearly documents which properties are being tested (schema conformance and token aggregation), and the helper functions createDetectionIssues() and createSuggestions() provide clean, reusable test fixtures.


60-198: LGTM! Comprehensive schema conformance and edge case coverage.

The tests thoroughly verify the CheckResult structure including all required fields, proper typing, score ranges, and correct handling of missing or partial suggestions.


230-393: LGTM! Thorough testing of JudgeResult assembly and criterion grouping.

The tests properly verify criterion-level structure, violation grouping logic, score calculation based on violation count, and fallback behavior for missing suggestions.


396-460: LGTM! Complete coverage of token usage aggregation scenarios.

All edge cases are covered: both inputs undefined, single input defined, both inputs defined, zero values, and large token counts.


462-555: LGTM! Integration tests validate end-to-end result assembly.

The integration scenarios effectively combine detection issues, suggestions, and token aggregation to verify the complete workflow for both check and judge evaluation types.

tests/anthropic-provider.test.ts (6)

49-69: LGTM! Clean mock constructor pattern with error class attachment.

The mock constructor pattern properly attaches error classes as static properties, enabling instanceof checks in production code while keeping tests isolated from the network. This aligns with the coding guidelines for dependency injection in tests.


859-915: LGTM! Proper test setup for unstructured response handling.

The test suite follows the established pattern with proper mock setup in beforeEach and conditional cleanup in afterEach. The first test correctly verifies basic text extraction and token usage.


917-991: LGTM! Tests verify text block concatenation and markdown preservation.

These tests ensure multiple text blocks are properly combined and that markdown formatting is preserved as-is, which is essential for the detection phase that returns free-form markdown output.


993-1086: LGTM! Edge case handling for whitespace and missing content.

Tests properly verify whitespace trimming and appropriate error throwing when content blocks are empty or contain only non-text blocks.


1088-1141: LGTM! Comprehensive API error handling tests for unstructured calls.

The tests verify that different Anthropic error types (APIError, RateLimitError, AuthenticationError) are properly caught and re-thrown with appropriate error messages specific to unstructured calls.


1196-1228: LGTM! Critical test verifying tools are excluded from unstructured calls.

This test at lines 1196-1228 is essential for the two-phase architecture—it ensures that runPromptUnstructured does not include tools or tool_choice parameters, which would interfere with free-form text generation in the detection phase.

src/evaluators/base-evaluator.ts (4)

42-55: LGTM! Clean initialization of two-phase architecture components.

The constructor properly initializes all three components needed for the two-phase flow: DetectionPhaseRunner, SuggestionPhaseRunner, and ResultAssembler. The design follows good separation of concerns.


126-191: LGTM! Correct two-phase evaluation flow for judge results.

The implementation correctly:

  1. Runs detection per chunk with criteria context
  2. Only invokes suggestion phase when issues are detected
  3. Passes full document (not chunks) to suggestion phase for coherent suggestions
  4. Properly aggregates token usage from both phases

202-276: LGTM! Check evaluation follows same two-phase pattern with proper severity handling.

The severity resolution at line 252 correctly prioritizes defaultSeverity (constructor param) over prompt.meta.severity, allowing runtime override of prompt-specified severity. The strictness is passed through to the assembler for normalization.


278-296: LGTM! Clean criteria string builder with graceful empty handling.

The method produces a readable format for LLM prompts and correctly handles the edge case of empty or undefined criteria.

src/evaluators/suggestion-phase.ts (3)

24-56: LGTM! Well-designed interfaces for suggestion phase.

The Suggestion, SuggestionResult, and SuggestionPhaseOptions interfaces are clearly documented and provide a clean API surface. The hasSuggestions boolean in SuggestionResult is a helpful convenience property.


164-190: LGTM! Well-formatted issue sections for LLM consumption.

The markdown format is clear and structured, with 1-based indexing that correctly corresponds to the issueIndex expected in suggestion responses. The "(none)" fallback for empty context is a nice touch.


97-105: Verify: content parameter passed to both runPromptStructured and embedded in the prompt template.

The content is embedded in the prompt template via buildPrompt() (line 91), which replaces the {content} placeholder in the suggestion-phase template. This same content is also passed separately to runPromptStructured() (line 100) as the first argument. All LLM providers then receive content twice: once in the system prompt (via buildPromptBodyForStructured(promptText), which preserves the embedded content) and again in the user message (via Input:\n\n${content}). This redundancy is consistent across all LLM providers (OpenAI, Gemini, Azure OpenAI, Anthropic). Consider whether this duplication is intentional or should be refactored.

tests/base-evaluator-two-phase.test.ts (4)

20-68: LGTM! Well-designed mock fixtures for testing.

The MOCK_PROMPT_FILE and MOCK_PROMPT_FILE_JUDGE provide reusable test fixtures with realistic structure. The CREATE_MOCK_LLM_PROVIDER factory function creates fresh mocks per test, following the dependency injection pattern recommended in the coding guidelines.


320-374: LGTM! Critical test verifying full document context in suggestion phase.

This test is essential for the two-phase architecture—it verifies that even when content is chunked for detection, the suggestion phase receives the complete document. The assertion checking suggestionPhaseContent!.length > 1000 provides reasonable validation that the full content was passed.


499-552: LGTM! Tests verify criteria string building through public interface.

Good approach testing the private buildCriteriaString method through the public evaluate interface, verifying both populated criteria (with weights) and the graceful fallback for missing criteria.


667-848: LGTM! Comprehensive severity and strictness handling tests.

The tests cover all severity precedence levels:

  1. Default severity when none specified
  2. Prompt-specified severity
  3. Constructor parameter override

The strictness normalization test ensures string values like "strict" are handled correctly.

src/evaluators/result-assembler.ts (4)

1-20: LGTM!

Clean imports using import type for type-only imports, and the module-level documentation clearly explains the two-phase architecture responsibility.


21-33: LGTM!

The options interface is well-documented and uses narrow types appropriately. The union type for strictness provides good flexibility while maintaining type safety.


271-288: LGTM!

Clean functional approach that correctly handles all combinations of undefined inputs and produces a properly typed result.
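A sketch of the aggregation contract those cases describe, assuming the `TokenUsage` shape used elsewhere in the PR (this is not the assembler's actual code):

```typescript
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

// Undefined in, undefined out when neither phase reported usage;
// otherwise a missing side counts as zero.
function aggregateUsage(a?: TokenUsage, b?: TokenUsage): TokenUsage | undefined {
  if (!a && !b) return undefined;
  return {
    inputTokens: (a?.inputTokens ?? 0) + (b?.inputTokens ?? 0),
    outputTokens: (a?.outputTokens ?? 0) + (b?.outputTokens ?? 0),
  };
}
```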


333-346: LGTM!

The weighted score calculation is correct, with proper handling for the edge case of no criteria. The rounding logic correctly produces one decimal place precision.

src/evaluators/detection-phase.ts (7)

1-16: LGTM!

Clear module documentation and proper use of type-only imports following TypeScript best practices.


17-34: LGTM!

Well-documented interface with clear field definitions matching the expected LLM response structure.


36-56: LGTM!

Both interfaces are well-documented with appropriate optional fields.


69-113: LGTM!

The run method follows a clean flow: prompt construction → LLM call with retry → response parsing → result assembly. The conditional assignment of usage is a good pattern for optional fields.


121-124: LGTM!

Simple template replacement. If the template contains multiple {criteria} placeholders, consider using replaceAll() instead, but this is fine if only one occurrence is expected.


135-155: LGTM!

The parsing logic correctly handles the "no issues found" case and robustly splits the response into individual issue sections. The use of slice(1) to skip intro text is appropriate.


163-226: LGTM!

The parsing logic correctly extracts fields using well-crafted regex patterns with the /s flag for multiline content. Good graceful degradation with try-catch that returns null for malformed sections, and proper handling of the unknown error type per coding guidelines.
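A small sketch of why the `/s` flag matters for multi-line fields (the field labels here are hypothetical, not the parser's actual format):

```typescript
// With /s (dotAll), `.` also matches newlines, so a field body that spans
// lines is captured whole; without it, the pattern fails to match at all.
const section = `**Quoted text:** "The system was designed"
**Analysis:** Passive voice obscures
the actor responsible.`;

const withFlag = /\*\*Analysis:\*\*\s*(.+)$/s.exec(section)?.[1]?.trim();
const withoutFlag = /\*\*Analysis:\*\*\s*(.+)$/.exec(section)?.[1]?.trim();
```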

Comment on lines +206 to +239
const score: 1 | 2 | 3 | 4 =
violations.length === 0
? 4
: violations.length === 1
? 3
: violations.length <= 3
? 2
: 1;

// Build summary
const summary =
violations.length === 0
? `Pass: ${criterionName}`
: `Issue${violations.length > 1 ? "s" : ""} found with ${criterionName}`;

// Build reasoning
let reasoning: string;
if (violations.length === 0) {
reasoning = `Content meets the ${criterionName} criterion.`;
} else {
const parts: string[] = [
`${violations.length} violation${violations.length > 1 ? "s" : ""} of ${criterionName} found.`,
];
for (const v of violations) {
parts.push(`- "${v.quotedText}": ${v.analysis}`);
}
reasoning = parts.join("\n");
}

return {
name: criterionName,
weight,
score,
normalized_score: score * 2.5, // 1-4 to 1-10 scale

⚠️ Potential issue | 🟡 Minor

Misleading comment: normalized_score range is 2.5–10, not 1–10.

The score ranges from 1 to 4, so score * 2.5 produces values in the range 2.5–10, not 1–10 as stated in the comment. This may confuse maintainers or cause issues if downstream code expects a true 1–10 scale.

If a 1–10 scale is desired, consider adjusting the formula (e.g., (score - 1) * 3 + 1 would map 1→1, 4→10).

Suggested comment fix
-          normalized_score: score * 2.5, // 1-4 to 1-10 scale
+          normalized_score: score * 2.5, // 1-4 to 2.5-10 scale
🤖 Prompt for AI Agents
In @src/evaluators/result-assembler.ts around lines 206 - 239, The comment
claiming normalized_score maps 1–4 to 1–10 is incorrect: normalized_score =
score * 2.5 yields 2.5–10; update the code in the result assembly to either (A)
correct the comment to state the actual 2.5–10 range for normalized_score, or
(B) change the normalization formula to map 1→1 and 4→10 by replacing score *
2.5 with the linear mapping (score - 1) * 3 + 1 (apply this change where
normalized_score is computed and keep the comment consistent).
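For comparison, a quick sketch of the two mappings (option A keeps the current formula, option B is the linear remap suggested above):

```typescript
// Option A, current formula: 1→2.5, 2→5, 3→7.5, 4→10 (range 2.5-10).
const scaleCurrent = (score: number): number => score * 2.5;

// Option B, linear remap: 1→1, 2→4, 3→7, 4→10 (true 1-10 range).
const scaleLinear = (score: number): number => (score - 1) * 3 + 1;
```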

Comment on lines +1 to +6
/**
* Retry utility with logging for LLM operations.
*
* Provides exponential backoff retry logic with detailed logging for debugging
* transient failures in LLM API calls.
*/

⚠️ Potential issue | 🟠 Major

Documentation claims exponential backoff, but implementation has none.

The JSDoc states "exponential backoff retry logic" but the implementation immediately retries without any delay between attempts. This could overwhelm APIs during transient failures and defeat the purpose of retrying.

🔧 Proposed fix to add exponential backoff
+const sleep = (ms: number) => new Promise(resolve => setTimeout(resolve, ms));
+
 export async function withRetry<T>(
   operation: () => Promise<T>,
   options: RetryOptions
 ): Promise<T> {
-  const { maxRetries = 3, context } = options;
+  const { maxRetries = 3, context, baseDelayMs = 1000 } = options;
   let lastError: unknown;
 
   for (let attempt = 1; attempt <= maxRetries; attempt++) {
     try {
       const data = await operation();
       if (attempt > 1) {
         console.log(
           `[vectorlint] ${context}: Success on attempt ${attempt}/${maxRetries}`
         );
       }
       return data;
     } catch (error) {
       lastError = error;
       if (attempt < maxRetries) {
+        const delay = baseDelayMs * Math.pow(2, attempt - 1);
         console.log(
-          `[vectorlint] ${context}: Attempt ${attempt}/${maxRetries} failed, retrying...`
+          `[vectorlint] ${context}: Attempt ${attempt}/${maxRetries} failed, retrying in ${delay}ms...`
         );
+        await sleep(delay);
       } else {
         console.log(
           `[vectorlint] ${context}: All ${maxRetries} attempts exhausted`
         );
       }
     }
   }
 
   throw lastError;
 }

Update RetryOptions interface:

 export interface RetryOptions {
   /** Maximum number of retry attempts (default: 3) */
   maxRetries?: number;
   /** Context string for logging (e.g., "detection phase", "suggestion phase") */
   context: string;
+  /** Base delay in milliseconds for exponential backoff (default: 1000) */
+  baseDelayMs?: number;
 }
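With the proposed defaults (`baseDelayMs = 1000`), the backoff schedule would look like this:

```typescript
// Delay before retry attempt N (1-based): baseDelayMs * 2^(N-1).
const baseDelayMs = 1000;
const backoffDelays = (attempts: number): number[] =>
  Array.from({ length: attempts }, (_, i) => baseDelayMs * Math.pow(2, i));
```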

Comment on lines +30 to +31
const { maxRetries = 3, context } = options;
let lastError: unknown;

⚠️ Potential issue | 🟡 Minor

Edge case: maxRetries = 0 throws undefined.

If a caller passes maxRetries: 0, the loop never executes and lastError remains undefined, which would throw a confusing undefined error.

🔧 Proposed fix
 export async function withRetry<T>(
   operation: () => Promise<T>,
   options: RetryOptions
 ): Promise<T> {
   const { maxRetries = 3, context } = options;
+  
+  if (maxRetries < 1) {
+    throw new Error(`${context}: maxRetries must be at least 1`);
+  }
+  
   let lastError: unknown;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-const { maxRetries = 3, context } = options;
-let lastError: unknown;
+const { maxRetries = 3, context } = options;
+
+if (maxRetries < 1) {
+  throw new Error(`${context}: maxRetries must be at least 1`);
+}
+
+let lastError: unknown;
🤖 Prompt for AI Agents
In @src/evaluators/retry.ts around lines 30-31: handle the case where maxRetries is 0 so lastError is never left undefined. Either validate up front that maxRetries is >= 1, or set lastError to a meaningful default Error before the retry loop, then throw that descriptive error instead of undefined. Update the logic around maxRetries, lastError, and the retry loop in retry.ts so callers passing maxRetries: 0 get a clear error message rather than an undefined throw.

Comment on lines +96 to +138
async runPromptUnstructured(content: string, promptText: string): Promise<LLMResult<string>> {
    const systemPrompt = this.builder.buildPromptBodyForStructured(promptText);

    const fullPrompt = `${systemPrompt}

Input:
${content}
`;

    if (this.config.debug) {
      console.error('[vectorlint] Sending unstructured request to Gemini:', {
        model: this.config.model,
        temperature: this.config.temperature,
      });
      if (this.config.showPrompt) {
        console.error('[vectorlint] Full prompt:');
        console.error(fullPrompt);
      } else if (this.config.showPromptTrunc) {
        console.error('[vectorlint] Prompt preview (first 500 chars):');
        console.error(fullPrompt.slice(0, 500));
        if (fullPrompt.length > 500) console.error('... [truncated]');
      }
    }

    try {
      const result = await this.model.generateContent(fullPrompt);
      const response = result.response;
      const text = response.text();
      const usageMetadata = response.usageMetadata;

      const llmResult: LLMResult<string> = { data: text.trim() };
      if (usageMetadata) {
        llmResult.usage = {
          inputTokens: usageMetadata.promptTokenCount ?? 0,
          outputTokens: usageMetadata.candidatesTokenCount ?? 0,
        };
      }
      return llmResult;
    } catch (e: unknown) {
      const err = handleUnknownError(e, 'Gemini API call');
      throw new Error(`Gemini API call failed: ${err.message}`);
    }
  }

⚠️ Potential issue | 🟠 Major

Model configured for JSON output is reused for unstructured calls.

The this.model instance is initialized in the constructor with responseMimeType: "application/json" (line 37), which instructs Gemini to always return JSON-formatted output. When runPromptUnstructured calls this.model.generateContent(fullPrompt), the API will still attempt to format the response as JSON, which conflicts with the detection phase's expectation of free-form markdown text.

🔧 Suggested fix: Create an unstructured model on-demand
 async runPromptUnstructured(content: string, promptText: string): Promise<LLMResult<string>> {
     const systemPrompt = this.builder.buildPromptBodyForStructured(promptText);

     const fullPrompt = `${systemPrompt}

         Input:
         ${content}
     `;

     if (this.config.debug) {
         console.error('[vectorlint] Sending unstructured request to Gemini:', {
             model: this.config.model,
             temperature: this.config.temperature,
         });
         if (this.config.showPrompt) {
             console.error('[vectorlint] Full prompt:');
             console.error(fullPrompt);
         } else if (this.config.showPromptTrunc) {
             console.error('[vectorlint] Prompt preview (first 500 chars):');
             console.error(fullPrompt.slice(0, 500));
             if (fullPrompt.length > 500) console.error('... [truncated]');
         }
     }

     try {
+        // Create a model without JSON response type for unstructured text output
+        const unstructuredModel = this.client.getGenerativeModel({
+            model: this.config.model!,
+            generationConfig: {
+                ...(this.config.temperature !== undefined && { temperature: this.config.temperature }),
+            }
+        });
-        const result = await this.model.generateContent(fullPrompt);
+        const result = await unstructuredModel.generateContent(fullPrompt);
         const response = result.response;
         const text = response.text();
🤖 Prompt for AI Agents
In @src/providers/gemini-provider.ts around lines 96-138: the model instance created in the constructor is configured with responseMimeType: "application/json", so runPromptUnstructured still requests JSON. Update runPromptUnstructured to create or obtain a temporary unstructured model (e.g., instantiate a new model without responseMimeType, or with "text/plain") and call generateContent on that model instead of this.model. Reuse the same client/config values (model name, temperature), and leave the constructor-initialized JSON model used elsewhere unchanged.
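The pattern behind this fix — keeping the constructor's JSON-configured model for structured calls and deriving a plain-text model on demand — can be sketched against a stand-in client. FakeClient and Provider here are illustrative names, not the PR's classes:

```typescript
interface GenerationConfig {
  temperature?: number;
  responseMimeType?: string;
}

interface Model {
  config: GenerationConfig;
}

// Stand-in for the Gemini SDK client: getGenerativeModel() binds a model
// to the generationConfig it was created with.
class FakeClient {
  getGenerativeModel(opts: { model: string; generationConfig: GenerationConfig }): Model {
    return { config: opts.generationConfig };
  }
}

class Provider {
  readonly structured: Model;

  constructor(
    private client: FakeClient,
    private modelName: string,
    private temperature?: number
  ) {
    // The constructor-initialized model stays JSON-only, as in the PR.
    this.structured = client.getGenerativeModel({
      model: modelName,
      generationConfig: {
        responseMimeType: 'application/json',
        ...(temperature !== undefined && { temperature }),
      },
    });
  }

  // Unstructured calls get a model with no responseMimeType, so the API
  // is free to return free-form markdown.
  unstructuredModel(): Model {
    return this.client.getGenerativeModel({
      model: this.modelName,
      generationConfig: {
        ...(this.temperature !== undefined && { temperature: this.temperature }),
      },
    });
  }
}
```

Reusing the same model name and temperature for both configurations keeps the two phases comparable while only the response format differs.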

      throw new Error('Empty response from OpenAI API (no choices).');
    }

    const responseText = firstChoice.message.content?.trim() ?? '';

⚠️ Potential issue | 🟡 Minor

Inconsistent empty content handling vs. structured path.

The structured path (line 155-157) throws an error when responseText is empty, but the unstructured path falls back to an empty string with ?? ''. This inconsistency could mask issues where the LLM returns no content.

🔧 Suggested fix for consistency
-    const responseText = firstChoice.message.content?.trim() ?? '';
+    const responseText = firstChoice.message.content?.trim();
+    if (!responseText) {
+      throw new Error('Empty response from OpenAI API (no content).');
+    }

Alternatively, if empty responses are valid for unstructured calls, document this intentional difference.

📝 Committable suggestion

Suggested change
-    const responseText = firstChoice.message.content?.trim() ?? '';
+    const responseText = firstChoice.message.content?.trim();
+    if (!responseText) {
+      throw new Error('Empty response from OpenAI API (no content).');
+    }
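If the strict behavior is wanted on both paths, a small shared guard keeps them in lockstep. This helper is illustrative and not present in the PR:

```typescript
// Trims the model output and rejects empty responses uniformly, so the
// structured and unstructured paths cannot drift apart.
function requireNonEmpty(text: string | null | undefined, source: string): string {
  const trimmed = text?.trim();
  if (!trimmed) {
    throw new Error(`Empty response from ${source} (no content).`);
  }
  return trimmed;
}
```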

Comment on lines +362 to +419
describe('Debugging and Logging', () => {
  let consoleSpy: ReturnType<typeof vi.spyOn>;

  beforeEach(() => {
    consoleSpy = vi.spyOn(console, 'log').mockImplementation(() => { });
  });

  afterEach(() => {
    consoleSpy.mockRestore();
  });

  it('logs debug information for unstructured calls', async () => {
    const config = {
      apiKey: 'sk-test-key',
      endpoint: 'https://test.openai.azure.com',
      deploymentName: 'gpt-4o',
      debug: true,
    };

    const mockResponse: OpenAIResponse = {
      choices: [
        {
          message: {
            content: 'Response text',
          },
          finish_reason: 'stop',
        },
      ],
      usage: {
        prompt_tokens: 50,
        completion_tokens: 20,
        total_tokens: 70,
      },
    };

    SHARED_CREATE.mockResolvedValue(mockResponse);

    const provider = new AzureOpenAIProvider(config);
    await provider.runPromptUnstructured('Test content', 'Test prompt');

    expect(consoleSpy).toHaveBeenCalledWith(
      '[vectorlint] Sending unstructured request to Azure OpenAI:',
      expect.objectContaining({
        model: 'gpt-4o',
        temperature: undefined,
      })
    );

    expect(consoleSpy).toHaveBeenCalledWith(
      '[vectorlint] LLM response meta:',
      expect.objectContaining({
        // eslint-disable-next-line @typescript-eslint/no-unsafe-assignment
        usage: expect.anything(),
        finish_reason: 'stop',
      })
    );
  });
});

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== Checking console logging in provider implementations ==="
rg -n "console\.(log|error|warn)" src/providers/ --type ts -C 2

Repository: TRocket-Labs/vectorlint

Length of output: 16050


Standardize debug logging across all providers.

The Azure test correctly spies on console.log, which matches the Azure OpenAI implementation. However, the Gemini provider is the only one using console.error for debug logging, while Azure, OpenAI, Anthropic, and Perplexity all use console.log. Align the Gemini provider to use console.log for consistency across all providers.

🤖 Prompt for AI Agents
In @tests/azure-openai-provider.test.ts around lines 362-419: the Gemini provider logs debug output with console.error while the other providers use console.log. Locate the Gemini provider class (e.g., GeminiProvider) and replace the debug console.error calls in its logging code paths (methods like runPromptUnstructured / runPromptStructured, or anywhere console.error emits "[vectorlint]" debug messages) with console.log, preserving the exact log message and payload shape so tests and consumers remain consistent.
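One way to keep the channel from drifting again is to route every provider's debug output through a single helper. This is a sketch; debugLog is a hypothetical name, not part of the PR:

```typescript
// Single place that decides the output channel; tests that spy on
// console.log then work identically for every provider.
function debugLog(message: string, payload?: unknown): void {
  if (payload !== undefined) {
    console.log(message, payload);
  } else {
    console.log(message);
  }
}
```

Each provider would then call debugLog('[vectorlint] ...', meta) instead of picking console.log or console.error itself.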
